# A 0.07 pJ/b/dB 36-Gb/s PAM-3 Receiver Using Inductor-Reused CTLE and One-Tap Loop-Unrolled DFE in 22-nm CMOS

Pin-Yuan Chiu and Shen-Iuan Liu, Fellow, IEEE

Abstract—This paper presents a 36 Gb/s (23.04 GBaud) 3-level pulse amplitude modulation (PAM-3) receiver (RX). The proposed inductor-reused continuous-time linear equalizer (CTLE) uses feedforward and inductive peaking techniques. Additionally, the number of data slicers in the PAM-3 receiver with a loop-unrolled decision feedback equalizer (DFE) is reduced. Furthermore, a baud-rate phase detector (BRPD) is presented. Fabricated in 22-nm CMOS technology, this receiver compensates for a channel loss of 20.5 dB at 11.52 GHz and achieves a bit error rate (BER) of less than 10<sup>-12</sup> with a pseudo-random ternary sequence (PRTS) of 3<sup>7</sup>-1. The measured integrated jitter of the recovered clock is 267 fs<sub>rms</sub> at 720 MHz, and the retimed data exhibits 10.98 ps<sub>pp</sub> jitter. The overall receiver consumes 51.7 mW, corresponding to an energy efficiency of 1.44 pJ/b and a figure of merit (FoM) of 0.07 pJ/b/dB.

Index Terms—Pulse amplitude modulation, continuous-time linear equalizer, loop-unrolled decision-feedback equalizer, baud-rate phase detector, 1+D, clock and data recovery, pseudorandom ternary sequence.

#### I. INTRODUCTION

MULTI-LEVEL signaling, especially 4-level pulse amplitude modulation (PAM-4), has an advantage over non-return-to-zero (NRZ) signaling because it requires less bandwidth. However, it also raises several issues: a reduced voltage margin, stricter linearity requirements for the analog front end (AFE), increased circuit complexity, and higher vulnerability to inter-symbol interference (ISI). In addition, when multi-level signals are converted into binary data, the slicers' reference voltages must be optimized to improve both the jitter tolerance and the bit error rate (BER). Recently, 3-level pulse amplitude modulation (PAM-3) signaling [1], [2] has been adopted to address these challenges.

The conventional continuous-time linear equalizer (CTLE) [3] trades off high-frequency gain against bandwidth when power consumption is limited. In addition, reducing the low-frequency gain of the CTLE to enhance the high-frequency boosting gain requires a variable gain amplifier (VGA). The VGA may increase the power consumption and reduce the bandwidth. In this work, an inductor-reused CTLE is presented by modifying the limiting amplifier (LA) [4]. The proposed CTLE adopts feedforward and current-reduced techniques to enhance the boosting gain and the power efficiency, respectively. Two capacitor arrays adjust the high-frequency gain of the CTLE while preserving its low-frequency gain. As a result, the low-frequency and high-frequency gains of the proposed CTLE are effectively decoupled.

Received 26 March 2024; revised 17 June 2024, 4 September 2024, and 7 November 2024; accepted 26 January 2025. Date of publication 4 February 2025; date of current version 31 March 2025. This article was recommended by Associate Editor Z. Yuanjin. (Corresponding author: Shen-Iuan Liu.)

The authors are with the Graduate Institute of Electronics Engineering and the Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: lsi@ntu.edu.tw).

Digital Object Identifier 10.1109/TCSI.2025.3536091

A conventional quarter-rate PAM-3 receiver with a one-tap loop-unrolled decision feedback equalizer (DFE) requires 24 data slicers. These slicers degrade the bandwidth of the preceding equalizer and also increase the power consumption. In prior work, the 1+0.5D approach [5] for PAM-4 signaling relies on a high channel loss and requires a feedforward equalizer (FFE) in the transmitter to shape the response. Inspired by the timing function for NRZ signaling [6], the proposed quarter-rate PAM-3 receiver uses sixteen data slicers for a one-tap loop-unrolled DFE and four error slicers for a baud-rate phase detector (BRPD). The 1+D technique reduces the required number of data slicers. This lowers the capacitive loading on the preceding CTLE and improves the power efficiency. In addition, the loop-unrolled method relaxes the stringent timing requirement of a direct-feedback DFE and avoids its power-consuming summer. The digital logic also benefits from the low power of the 22-nm CMOS technology.

This paper is organized as follows. Section II introduces the analysis of the inductor-reused CTLE. Section III describes the one-tap loop-unrolled DFE and the BRPD. Section IV presents the overall PAM-3 receiver implementation. Section V shows the measurement results. Finally, the conclusion is drawn in Section VI.

#### II. INDUCTOR-REUSED CTLE

#### A. CTLE Boost and Tuning

Fig. 1(a) shows the conventional CTLE, composed of a gain stage and an inductive peaking buffer. This quarter-rate PAM-3 receiver requires twenty slicers, so their input capacitances are included in the load of the CTLE. Using the half-circuit small-signal model in Fig. 1(b), the transfer function of the CTLE is derived as

$$\frac{V_{out1}}{V_{in1}} = \frac{G_{m1,2}G_{m3,4}R_{D1}(R_{D2} + L_{D}s)}{(R_{D1}C_{P,Q}s + 1)(L_{D}C_{L}s^{2} + R_{D2}C_{L}s + 1)} \tag{1}$$



Fig. 1. (a) Conventional CTLE with an inductive peaking buffer, and (b) its half-circuit small-signal model.

where

$$G_{m1,2} = \frac{1}{\frac{1}{g_{m1,2}} + \left(\frac{R_{S1}}{2}||\frac{1}{2C_{S1}s}||\left(\frac{R_{S2}}{2} + \frac{1}{2C_{S2}s}\right)\right)} \tag{2}$$

$$G_{m3,4} = \frac{1}{\frac{1}{g_{m3,4}} + \frac{R_{S3}}{2}} \tag{3}$$

where $C_{P,Q} = C_P = C_Q$, $g_{m1,2}$ denotes the transconductance of $M_1$ and $M_2$, and $g_{m3,4}$ denotes the transconductance of $M_3$ and $M_4$.

The proposed inductor-reused CTLE, shown in Fig. 2(a), uses feedforward and inductive peaking techniques to improve the high-frequency boosting gain and to drive the input capacitances of twenty slicers. Its half-circuit small-signal model is shown in Fig. 2(b), and the corresponding transfer function is given in (4). The low-frequency gains of (1) and (4) are both given by

$$\frac{V_{out1}}{V_{in1}} = \frac{V_{out2}}{V_{in2}} = \frac{R_{D1}R_{D2}}{\left(\frac{1}{g_{m1,2}} + \frac{R_{S1}}{2}\right)\left(\frac{1}{g_{m3,4}} + \frac{R_{S3}}{2}\right)}. \tag{5}$$

Fig. 2. (a) Proposed inductor-reused CTLE, and (b) its half-circuit small-signal model.

At the frequency of $1/(2\pi\sqrt{L_DC_L})$, and assuming $C_L \gg C_{P,Q}$, the high-frequency gains of (1) and (4) can be approximated as

$$\frac{V_{out1}}{V_{in1}} \approx \frac{g_{m1,2}G_{m3,4}R_{D1}\left(R_{D2} + \sqrt{\frac{L_D}{C_L}}\right)}{2 + R_{D2}\sqrt{\frac{C_L}{L_D}}} \tag{6}$$

and

$$\frac{V_{out2}}{V_{in2}} \approx \frac{g_{m1,2}G_{m3,4}R_{D1}\left(R_{D2} + \sqrt{\frac{L_D}{C_L}}\left(1 + \frac{R_{D2}}{R_{D1}} + \frac{1}{G_{m3,4}R_{D1}}\right)\right)}{\left(2 - \left(R_{D2} + \frac{G_{m3,4}L_D}{C_{P,Q}}\right)/(R_{D1} + R_{D2})\right) + (R_{D1}||R_{D2})\sqrt{\frac{C_L}{L_D}}}, \tag{7}$$

respectively. Dividing (7) by (6) yields

$$\frac{V_{out2}/V_{in2}}{V_{out1}/V_{in1}} \approx \frac{R_{D2} + \sqrt{\frac{L_D}{C_L}} \left(1 + \frac{R_{D2}}{R_{D1}} + \frac{1}{G_{m3,4}R_{D1}}\right)}{R_{D2} + \sqrt{\frac{L_D}{C_L}}} \times \frac{2 + R_{D2}\sqrt{\frac{C_L}{L_D}}}{\left(2 - \left(R_{D2} + \frac{G_{m3,4}L_D}{C_{P,Q}}\right)/(R_{D1} + R_{D2})\right) + (R_{D1}||R_{D2})\sqrt{\frac{C_L}{L_D}}}. \tag{8}$$

$$\frac{V_{out2}}{V_{in2}} = G_{m1,2} \frac{L_D \left(1 + G_{m3,4} \left(R_{D1} + R_{D2}\right)\right) s + G_{m3,4} R_{D1} R_{D2}}{L_D C_L s^2 + \left(R_{D2} C_L - G_{m3,4} L_D\right) s + 1 + \frac{L_D C_L \left(R_{D1} + R_{D2}\right) s^2 + \left(R_{D1} R_{D2} C_L + L_D\right) s + R_{D1}}{1/(C_{P,Q} s)}} \tag{4}$$

TABLE I COMPONENT PARAMETERS FOR FIG. 1(a) AND FIG. 2(a)

| Parameter                 | Value         | Parameter                 | Value         |
|---------------------------|---------------|---------------------------|---------------|
| $M_{1,2}$ (W/L)           | 50 µm / 60 nm | $M_{3,4}$ (W/L)           | 24 µm / 60 nm |
| $g_{\mathrm{m1,2}}$       | 25 mS         | $g_{\mathrm{m3,4}}$       | 24 mS         |
| $R_{\mathrm{D1}}$         | 150 Ω         | $R_{\mathrm{D2}}$         | 75 Ω          |
| $R_{\mathrm{S1}}$         | 400 Ω         | $R_{\mathrm{S3}}$         | 30 Ω          |
| $C_{\mathrm{S1}}$         | 200 fF        | $L_{\mathrm{D}}$          | 470 pH        |
| $R_{\mathrm{S2}}$         | 1.5 kΩ        | $C_{\mathrm{L}}$          | 200 fF        |
| $C_{\mathrm{S2}}$         | 200 fF        | $C_{\mathrm{P,Q}}$        | 30 fF         |
| $I_1$                     | 2 mA          | $I_2$                     | 4 mA          |



Fig. 3. Simulated AC response of Fig. 1(a) and Fig. 2(a).

Both factors in (8) are larger than one, so $V_{out2}/V_{in2}$ is larger than $V_{out1}/V_{in1}$ at the frequency of $1/(2\pi\sqrt{L_DC_L})$. Using the component parameters in TABLE I, Fig. 3 plots the transistor-level simulated AC responses of Fig. 1(a) and Fig. 2(a). At the same power consumption, the proposed CTLE achieves an additional 7-dB boost at 11.52 GHz compared with Fig. 1(a). The proposed inductor-reused CTLE therefore drives the heavy capacitive load of twenty slicers and offers a higher high-frequency boosting gain, without using more power than the conventional one.
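The Python sketch below is a rough numerical check of (8). It plugs the TABLE I values into the two factors of (8), takes $G_{m3,4}$ from (3), and sets $C_{P,Q}$ to its 30-fF minimum. It is only a hand-calculation aid under these assumptions and does not replace the transistor-level simulation in Fig. 3. Note that (8) is evaluated at $1/(2\pi\sqrt{L_DC_L})$, about 16.4 GHz with these values, not at 11.52 GHz. The two numbers are therefore not directly comparable.

```python
import math

# TABLE I values (SI units)
g_m34 = 24e-3             # S, transconductance of M3-M4
R_S3 = 30.0               # ohm
R_D1, R_D2 = 150.0, 75.0  # ohm
L_D = 470e-12             # H
C_L = 200e-15             # F
C_PQ = 30e-15             # F, minimum setting of the capacitor arrays

G_m34 = 1.0 / (1.0 / g_m34 + R_S3 / 2)           # eq. (3)
Z0 = math.sqrt(L_D / C_L)                        # sqrt(L_D/C_L) in ohm; sqrt(C_L/L_D) = 1/Z0
R_par = R_D1 * R_D2 / (R_D1 + R_D2)              # R_D1 || R_D2
f0 = 1.0 / (2 * math.pi * math.sqrt(L_D * C_L))  # frequency at which (6)-(8) apply

# First factor of (8): numerator of (7) over numerator of (6)
factor1 = (R_D2 + Z0 * (1 + R_D2 / R_D1 + 1 / (G_m34 * R_D1))) / (R_D2 + Z0)

# Second factor of (8): denominator of (6) over denominator of (7)
den7 = (2 - (R_D2 + G_m34 * L_D / C_PQ) / (R_D1 + R_D2)) + R_par / Z0
factor2 = (2 + R_D2 / Z0) / den7

ratio = factor1 * factor2
print(f"f0 = {f0 / 1e9:.1f} GHz")
print(f"factor1 = {factor1:.2f}, factor2 = {factor2:.2f}, "
      f"(8) = {ratio:.2f} ({20 * math.log10(ratio):.1f} dB)")
```

With these inputs, the first-order estimate comes out at roughly 10 dB. That is in the same range as, but not equal to, the simulated 7-dB improvement at 11.52 GHz. The estimate also neglects transistor parasitics.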

To further analyze the proposed CTLE, if  $C_{P,Q} \approx 0$  and  $G_{m1,2} \approx g_{m1,2}$  at high frequencies, (4) can be approximated as

$$\frac{V_{\text{out2}}}{V_{\text{in2}}} \approx \frac{g_{m1,2} \left( L_D \left( 1 + G_{m3,4} \left( R_{D1} + R_{D2} \right) \right) s + G_{m3,4} R_{D1} R_{D2} \right)}{L_D C_L s^2 + \left( R_{D2} C_L - G_{m3,4} L_D \right) s + 1} \tag{9}$$

According to (9), the damping ratio  $\zeta$  is calculated as

$$\zeta = \frac{R_{D2}}{2} \sqrt{\frac{C_L}{L_D}} - \frac{G_{m3,4}}{2} \sqrt{\frac{L_D}{C_L}}. \tag{10}$$

Fig. 4. Simulated AC response of Fig. 2(a) versus $C_{P,Q}$.

Increasing $G_{m3,4}$ raises the gain of $V_{out2}/V_{in2}$ in (9). However, it also reduces $\zeta$ in (10), which may cause amplitude ringing and undesired ISI. Besides the feedforward path, the proposed CTLE also exhibits positive feedback. A lower $G_{m3,4}$ decreases the loop gain and prevents oscillation, at the cost of weaker high-frequency compensation. Alternatively, a larger resistor $R_{D2}$ lowers the quality factor of the inductive load, which also helps prevent oscillation. To ensure stability, $G_{m3,4}$, $R_{D2}$, and $L_D$ are chosen using (10). The loop stability is then verified with the Cadence-provided component "diffstbprobe" [7]. With the component parameters in TABLE I, $G_{m3,4}$ and $\zeta$ are calculated as 17.6 mS and 0.35, respectively.
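The stated values of $G_{m3,4}$ and $\zeta$ follow from (3) and (10), as the short Python sketch below shows. It also reports the value of $G_{m3,4}$ at which (10) predicts $\zeta = 0$, namely $R_{D2}C_L/L_D$. Like (10), the sketch assumes $C_{P,Q} \approx 0$, so it is only a first-order design aid.

```python
import math

# TABLE I values (SI units)
g_m34 = 24e-3      # S
R_S3 = 30.0        # ohm
R_D2 = 75.0        # ohm
L_D = 470e-12      # H
C_L = 200e-15      # F


def zeta_of(G):
    """Damping ratio (10) for a given degenerated transconductance G (C_PQ ~ 0)."""
    return (R_D2 / 2) * math.sqrt(C_L / L_D) - (G / 2) * math.sqrt(L_D / C_L)


G_m34 = 1.0 / (1.0 / g_m34 + R_S3 / 2)   # eq. (3)
zeta = zeta_of(G_m34)
G_edge = R_D2 * C_L / L_D                # G_m3,4 at which (10) gives zeta = 0

print(f"G_m3,4 = {G_m34 * 1e3:.1f} mS, zeta = {zeta:.2f}")
print(f"zeta reaches 0 at G_m3,4 = {G_edge * 1e3:.1f} mS")
```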

As shown in (3), (7), and (10), the high-frequency gain of the proposed CTLE depends on $G_{m3,4}$, which is sensitive to process variations. From (4) and (5), increasing $C_{P,Q}$ lowers the high-frequency gain while keeping the low-frequency gain constant. As $C_{P,Q}$ increases, a larger portion of the signal current flows through $C_{P,Q}$, which reduces the feedforward gain. This approach simplifies the design by effectively decoupling the low-frequency and high-frequency gains of the proposed CTLE. Fig. 4 shows the transistor-level simulated AC response of the proposed CTLE. The boosting gain at 11.52 GHz varies from 17.6 dB to 6.3 dB. In this work, two 5-bit digitally controlled capacitor arrays set $C_{P,Q}$ between 30 fF and 270 fF.
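The Python sketch below shows how $C_{P,Q}$ trims the boost without changing the low-frequency gain. It evaluates (2), (3), and (4) with the TABLE I values at DC and at 11.52 GHz. It assumes the 5-bit code maps linearly onto $C_{P,Q}$ from 30 fF to 270 fF; the actual array may be coded differently. The model omits transistor parasitics, so only the trend is meaningful, not the absolute boost values in Fig. 4.

```python
import math

# TABLE I values (SI units)
g_m12, g_m34 = 25e-3, 24e-3
R_S1, R_S2, R_S3 = 400.0, 1.5e3, 30.0
C_S1, C_S2 = 200e-15, 200e-15
R_D1, R_D2 = 150.0, 75.0
L_D, C_L = 470e-12, 200e-15

G_m34 = 1 / (1 / g_m34 + R_S3 / 2)   # eq. (3)


def G_m12(s):
    """Eq. (2): transconductance of M1-M2 with RC source degeneration."""
    if s == 0:
        Z = R_S1 / 2                  # both degeneration capacitors are open at DC
    else:
        Y = 2 / R_S1 + 2 * C_S1 * s + 1 / (R_S2 / 2 + 1 / (2 * C_S2 * s))
        Z = 1 / Y
    return 1 / (1 / g_m12 + Z)


def H(s, C_PQ):
    """Eq. (4): V_out2/V_in2 of the proposed CTLE."""
    num = L_D * (1 + G_m34 * (R_D1 + R_D2)) * s + G_m34 * R_D1 * R_D2
    den = (L_D * C_L * s**2 + (R_D2 * C_L - G_m34 * L_D) * s + 1
           + C_PQ * s * (L_D * C_L * (R_D1 + R_D2) * s**2
                         + (R_D1 * R_D2 * C_L + L_D) * s + R_D1))
    return G_m12(s) * num / den


s_nyq = 2j * math.pi * 11.52e9
# Assumed linear 5-bit mapping: code 0 -> 30 fF, code 31 -> 270 fF
C_values = [30e-15 + k * 240e-15 / 31 for k in range(32)]
boost_db = [20 * math.log10(abs(H(s_nyq, c)) / abs(H(0, c))) for c in C_values]

print(f"low-frequency gain: {abs(H(0, C_values[0])):.3f} (same for every code)")
print(f"boost at 11.52 GHz: {boost_db[0]:.1f} dB (code 0) to {boost_db[-1]:.1f} dB (code 31)")
```

The extra denominator term in (4) is proportional to $C_{P,Q}s$ and vanishes at DC. This is why the low-frequency gain stays equal to (5) for every code while the boost falls monotonically across the sweep.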

#### B. Zero and Pole Locations

According to (4), there are three zeros and five poles. Among them, the first stage in Fig. 2(a) contributes two zeros and two poles, approximated as

$$\begin{cases} \omega_{z1} \approx \frac{-1}{(R_{S1} + R_{S2}) C_{S2}}, & \omega_{p1} \approx \frac{-1}{R_{S2} C_{S2}} \\ \omega_{z2} \approx \frac{-1}{R_S C_{S1}}, & \omega_{p2} \approx -\frac{1 + g_{m1,2} R_S / 2}{R_S C_{S1}} \end{cases} \tag{11}$$

where $R_S$ denotes $(R_{S1}||R_{S2})$. In Fig. 2(a), the zero and pole contributed by $R_{S2}$ and $C_{S2}$ compensate for the low-frequency loss [8] in the range of several hundred MHz. Similarly, the zero and pole contributed by $R_{S1}$ and $C_{S1}$ compensate for the high-frequency loss above several GHz.

Fig. 5. Simulated pole and zero loci of Eq. (4) on the s-plane as $C_{P,Q}$ changes from 30 fF to 270 fF.

Fig. 6. (a) PAM-3 eye diagram without ISI and two reference voltages, and (b) nine data levels with ISI and six reference voltages.

In (4), the zero contributed by the inductor is approximated as

$$\omega_{z3} \approx -(R_{D1}||R_{D2})/L_D \tag{12}$$

The remaining three poles in (4) are altered when  $C_{P,Q}$  changes. By using the component parameters in TABLE I, Fig. 5 shows the simulated pole and zero loci of (4) when  $C_{P,Q}$  changes from 30 fF to 270 fF. Through proper design, all the poles and zeros reside in the left-half plane to ensure stability.
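Plugging the TABLE I values into (11) and (12) shows where the zeros and poles sit. The Python sketch below prints their magnitudes in GHz; the negative signs in (11) and (12) only indicate left-half-plane locations. For comparison, it also prints the exact root of the numerator of (4). Equation (12) approximates that root under the assumption $G_{m3,4}(R_{D1}+R_{D2}) \gg 1$. With these values the product is only about 4, so (12) should be read as a rough estimate.

```python
import math

# TABLE I values (SI units)
R_S1, R_S2 = 400.0, 1.5e3
C_S1, C_S2 = 200e-15, 200e-15
g_m12, g_m34, R_S3 = 25e-3, 24e-3, 30.0
R_D1, R_D2, L_D = 150.0, 75.0, 470e-12

R_S = R_S1 * R_S2 / (R_S1 + R_S2)          # R_S1 || R_S2
G_m34 = 1 / (1 / g_m34 + R_S3 / 2)         # eq. (3)
R_par = R_D1 * R_D2 / (R_D1 + R_D2)        # R_D1 || R_D2

# Magnitudes in rad/s; all of these lie on the negative real axis
omega = {
    "z1": 1 / ((R_S1 + R_S2) * C_S2),
    "p1": 1 / (R_S2 * C_S2),
    "z2": 1 / (R_S * C_S1),
    "p2": (1 + g_m12 * R_S / 2) / (R_S * C_S1),
    "z3, eq. (12)": R_par / L_D,
    # exact root of the numerator of (4), which (12) approximates
    "z3, exact": G_m34 * R_D1 * R_D2 / (L_D * (1 + G_m34 * (R_D1 + R_D2))),
}
for name, w in omega.items():
    print(f"{name:13s} {w / (2 * math.pi) / 1e9:6.2f} GHz")
print(f"G_m3,4 (R_D1 + R_D2) = {G_m34 * (R_D1 + R_D2):.2f}")
```

The first pole-zero pair lands in the few-hundred-MHz range and the second in the GHz range. This is consistent with the roles described for $R_{S2}C_{S2}$ and $R_{S1}C_{S1}$.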

#### III. LOOP-UNROLLED DFE AND BAUD-RATE PD

#### A. Loop-Unrolled DFE

PAM-3 signaling involves three symbols, $S_n \in \{1, 0, -1\}$, at the $n^{th}$ sampling instant. Fig. 6(a) shows the amplitude of the PAM-3 data in the absence of ISI, expressed as $S_n \cdot h_0$, where $h_0$ denotes the main cursor. Recovering the data requires two slicers with the reference voltages $h_0/2$ and $-h_0/2$.

Fig. 7. A PAM-3 one-tap loop-unrolled DFE.

Fig. 8. Normalized SBR.

Assume that the equalizers compensate all post-cursors except the first post-cursor $h_1$. When $S_{n-1}$ and $h_1$ are considered, the amplitude of the present PAM-3 data equals $S_{n-1} \cdot h_1 + S_n \cdot h_0$. For instance, if $S_{n-1} = 1$, the present PAM-3 data has three possible levels, i.e., $h_1 \pm h_0$ and $h_1$. Recovering the data from these three levels requires two slicers with the reference voltages $h_1 \pm h_0/2$. Since $S_{n-1}$ may take any of the three symbols, there are nine data levels, shown in Fig. 6(b). Six reference voltages are therefore needed: $h_1 \pm h_0/2$, $\pm h_0/2$, and $-h_1 \pm h_0/2$. Based on this discussion, Fig. 7 shows a PAM-3 one-tap loop-unrolled DFE, composed of six data slicers with six reference voltages, two 3-to-1 MUXes, and two DFFs. Based on the previous data (DH[n-1], DL[n-1]), the two MUXes select two of the six slicer outputs to generate the present data (DH[n], DL[n]).
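The level and threshold bookkeeping of the conventional loop-unrolled DFE in Fig. 7 can be sketched behaviorally in Python. The cursor values below are hypothetical, and the function names are illustrative only. The sketch places each slicer pair at $S_{n-1}h_1 \pm h_0/2$ and lets the previous decision select the pair, as the 3-to-1 MUXes do.

```python
import itertools


def unrolled_refs(h0, h1):
    """Two slicer references per possible previous symbol S_{n-1} (six in total)."""
    return {prev: (prev * h1 + h0 / 2, prev * h1 - h0 / 2) for prev in (1, 0, -1)}


def decide(v, prev, refs):
    """Use the slicer pair selected by the previous decision (the 3-to-1 MUX)."""
    upper, lower = refs[prev]
    if v > upper:
        return 1
    if v > lower:
        return 0
    return -1


h0, h1 = 1.0, 0.4   # hypothetical main cursor and first post-cursor
refs = unrolled_refs(h0, h1)
levels = sorted({p * h1 + c * h0 for p, c in itertools.product((1, 0, -1), repeat=2)})
print(f"{len(levels)} data levels:", [round(x, 2) for x in levels])
print("six references:", sorted(round(r, 2) for pair in refs.values() for r in pair))
```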

To reduce the number of slicers in the PAM-3 loop-unrolled DFE, this work utilizes a timing function [6]. This timing function is expressed as

$$f_{BR}(\tau) = h_{BR}(\tau) - h_{BR}(\tau + T_b) \tag{13}$$

where  $h_{BR}(\tau)$  denotes the single bit response (SBR). The time  $T_b$  is equal to 1/SR, where SR represents the symbol rate of the PAM-3 signaling. Fig. 8 shows the normalized SBR, where the values  $h_{0,BR}$  and  $h_{1,BR}$  are sampled at  $\tau = \tau_0$  and  $\tau = \tau_0 + T_b$ , respectively. When  $h_{0,BR} = h_{1,BR}$ , the timing function in (13) is expressed as

$$f_{BR}(\tau_0) = h_{0,BR} - h_{1,BR} = 0. \tag{14}$$
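The Python sketch below is a toy illustration of (13) and (14). It assumes a hypothetical single-bit response: a 1-UI rectangular pulse filtered by a first-order RC low-pass with a time constant of 1 UI. It then bisects for the sampling instant $\tau_0$ at which $h_{BR}(\tau_0) = h_{BR}(\tau_0 + T_b)$. For this particular SBR, the zero also has a closed form, $\tau_0 = \tau_c \ln(2 - e^{-T_b/\tau_c})$, which serves as a check. Real channels, such as the one in Fig. 11(a), have no such closed form.

```python
import math

T_b = 1.0 / 23.04e9      # symbol period at 23.04 GBaud
tau_c = T_b              # hypothetical RC time constant of the toy channel


def sbr(t):
    """Toy single-bit response: a 1-UI pulse through a first-order RC low-pass."""
    if t < 0:
        return 0.0
    if t < T_b:
        return 1 - math.exp(-t / tau_c)
    return (1 - math.exp(-T_b / tau_c)) * math.exp(-(t - T_b) / tau_c)


def f_br(tau):
    """Timing function (13)."""
    return sbr(tau) - sbr(tau + T_b)


# f_br < 0 at tau = 0 and > 0 at tau = T_b, and it increases in between: bisect for (14).
lo, hi = 0.0, T_b
for _ in range(60):
    mid = (lo + hi) / 2
    if f_br(mid) < 0:
        lo = mid
    else:
        hi = mid
tau_0 = (lo + hi) / 2
print(f"tau_0 = {tau_0 / T_b:.3f} UI after the pulse starts; h0 = h1 = {sbr(tau_0):.3f}")
```

In hardware the BRPD performs this search implicitly, by driving the sampling phase until Early and Late decisions balance.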

With $h_{0,BR} = h_{1,BR}$, the nine data levels in Fig. 6(b) condense into five ($\pm 2h_{0,BR}$, $\pm h_{0,BR}$, and 0), as shown in Fig. 9. Recovering these five data levels requires four data slicers with the reference voltages $\pm 3h_{0,BR}/2$ and $\pm h_{0,BR}/2$. This is the so-called 1+D technique.



Fig. 9. Nine data levels condense into five and four reference voltages.



Fig. 10. Proposed PAM-3 receiver with a one-tap loop-unrolled DFE.

TABLE II DATA RECOVERY LOGIC

| $S_{n-1}$ | (DH[n-1], DL[n-1]) | (DH[n], DL[n]) |
|-----------|--------------------|----------------|
| 1         | (1, 1)             | $(DS_1, DS_2)$ |
| 0         | (0, 1)             | $(DS_2, DS_3)$ |
| -1        | (0, 0)             | $(DS_3, DS_4)$ |

Based on Fig. 9, Fig. 10 shows the proposed PAM-3 receiver with the one-tap loop-unrolled DFE. It is composed of four data slicers with four reference voltages, two 3-to-1 MUXes, two DFFs, an error slicer, and the BRPD logic. TABLE II shows the data recovery logic. The previous data (DH[n-1], DL[n-1]) selects two of the outputs $DS_{1-4}$ to recover the present data (DH[n], DL[n]). Note that this approach specifically benefits multi-level signaling and does not offer similar advantages for NRZ signaling [6].
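The 1+D data recovery in TABLE II can be modeled behaviorally in Python. The sketch below assumes that $DS_1$ to $DS_4$ compare the CTLE output against $+3h_{0,BR}/2$, $+h_{0,BR}/2$, $-h_{0,BR}/2$, and $-3h_{0,BR}/2$, respectively. This ordering is inferred from TABLE II rather than stated in the text. The sketch also assumes noise-free samples of the form $S_n h_{0,BR} + S_{n-1} h_{1,BR}$ with $h_{1,BR} = h_{0,BR}$.

```python
import random

ENC = {1: (1, 1), 0: (0, 1), -1: (0, 0)}      # symbol -> (DH, DL), per TABLE II
DEC = {code: sym for sym, code in ENC.items()}


def slicers(v, h0):
    """DS1..DS4, assumed to compare against +3h0/2, +h0/2, -h0/2, -3h0/2."""
    return [int(v > r) for r in (1.5 * h0, 0.5 * h0, -0.5 * h0, -1.5 * h0)]


def recover(samples, h0, prev=0):
    """TABLE II: the previous decision selects two of DS1..DS4 as (DH[n], DL[n])."""
    out = []
    for v in samples:
        ds1, ds2, ds3, ds4 = slicers(v, h0)
        pair = {1: (ds1, ds2), 0: (ds2, ds3), -1: (ds3, ds4)}[prev]
        prev = DEC[pair]
        out.append(prev)
    return out


random.seed(1)
h0 = 1.0                                         # h0,BR = h1,BR after BRPD lock
symbols = [random.choice((1, 0, -1)) for _ in range(1000)]
previous = [0] + symbols[:-1]                    # assume S_{-1} = 0
samples = [s * h0 + p * h0 for s, p in zip(symbols, previous)]   # 1+D levels
print("distinct levels:", sorted(set(samples)))
print("all symbols recovered:", recover(samples, h0) == symbols)
```

The slicer outputs are thermometer-coded, so each selected pair is always one of (1, 1), (0, 1), or (0, 0). The lookup back to a symbol is therefore always defined.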

#### B. Baud-Rate PD Logic

To realize the BRPD by using the timing function in (13), the output of the CTLE is approximated as

$$v_{ctle}[n] \approx S_n \cdot h_{0,BR} + S_{n-1} \cdot h_{1,BR}. \tag{15}$$

TABLE III TRUTH TABLE OF BRPD LOGIC

| (DH[n-1], DL[n-1]) | (DH[n], DL[n]) | ES[n] | BRPD Logic   |
|--------------------|----------------|-------|--------------|
| (1, 1)             | (0, 0)         | 1 / 0 | Early / Late |
| (0, 0)             | (1, 1)         | 0 / 1 | Early / Late |
| All other cases    | X              |       |              |

Fig. 11. (a) Measured channel response, (b) simulated eye diagram, (c) simulated probability difference of Late and Early, and (d)  $CK_n$  samples on  $Zone_1$ .

When the sequence $(S_{n-1}, S_n)$ is (-1, 1) or (1, -1), the sign of $v_{ctle}[n]$ reveals the polarity of the timing function in (13). In Fig. 10, the output of the error slicer is given by

$$ES[n] = sign(v_{ctle}[n]) \tag{16}$$

where $sign(\cdot)$ is the sign function, with ES[n] = 1 for a positive $v_{ctle}[n]$ and ES[n] = 0 otherwise. TABLE III shows the truth table of the BRPD logic. For instance, when the recovered data sequence (DH[n-1], DL[n-1], DH[n], DL[n]) = (1, 1, 0, 0), ES[n] = 1 or 0 indicates that the clock phase is Early or Late, respectively. In the steady state, the BRPD output toggles between Early and Late, which corresponds to $h_{0,BR} = h_{1,BR}$. This convergence condenses the nine data levels into five, as shown in Fig. 9, which enables the loop-unrolled DFE logic.
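TABLE III, together with (15) and (16), fits in a small Python function. The sketch below is a behavioral model only, and its cursor values are hypothetical. For a single-peaked SBR like the one in Fig. 8, sampling before the locking point gives $h_{1,BR} > h_{0,BR}$, and both qualifying transitions vote Early. Sampling after it gives $h_{1,BR} < h_{0,BR}$, and both vote Late.

```python
ENC = {1: (1, 1), 0: (0, 1), -1: (0, 0)}   # symbol -> (DH, DL)


def error_slicer(s_prev, s_cur, h0, h1):
    """Eqs. (15)-(16): ES[n] = 1 if the CTLE sample is positive, else 0."""
    return int(s_cur * h0 + s_prev * h1 > 0)


def brpd(prev, cur, es):
    """TABLE III: return 'Early', 'Late', or None when no decision is made."""
    if prev == (1, 1) and cur == (0, 0):     # +1 -> -1: sample = h1 - h0
        return "Early" if es == 1 else "Late"
    if prev == (0, 0) and cur == (1, 1):     # -1 -> +1: sample = h0 - h1
        return "Early" if es == 0 else "Late"
    return None


# Hypothetical cursor pairs: before the lock point (h1 > h0) and after it (h1 < h0)
for h0, h1 in ((0.4, 0.6), (0.6, 0.4)):
    votes = [brpd(ENC[a], ENC[b], error_slicer(a, b, h0, h1)) for a, b in ((1, -1), (-1, 1))]
    print(f"h0={h0}, h1={h1}: {votes}")
```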

Fig. 11(a) shows the measured channel response with a loss of 20.5 dB at 11.52 GHz. Fig. 11(b) shows the simulated eye diagram for this channel combined with the CTLE, using a 23.04-GBaud pseudo-random ternary sequence (PRTS) of $3^{7}-1$ [2]. Fig. 11(c) shows the simulated probability difference of Late and Early versus $\Delta T_{CK}$, where $Pr(\cdot)$ denotes the probability function. The timing error $\Delta T_{CK}$ is the timing difference between the ideal locking point and the $n^{th}$ clock $CK_n$. The ideal locking point is the sampling time at which $h_{0,BR} = h_{1,BR}$. For a small $|\Delta T_{CK}|$ in Fig. 11(c), the probability difference of Late and Early is approximately $\pm 0.22$. For a large $|\Delta T_{CK}|$, the Mueller-Müller PD (MMPD) [9] may exhibit multiple false-locking points in multi-level signaling [10], which degrades the BER. In contrast, the proposed BRPD features a single locking point. To explain this, two zones are defined with respect to $\Delta T_{CK}$: $Zone_1$ is within -0.5 UI $< \Delta T_{CK} <$ -0.23 UI, and $Zone_2$ is within 0.23 UI $< \Delta T_{CK} <$ 0.5 UI.

In Fig. 11(d), the eye diagram shows $CK_n$ sampling in $Zone_1$. Assume that (DH[n-1], DL[n-1]) is equal to (1, 1). When the data amplitude is below $h_{0,BR}/2$ (blue and red parts), both $DS_1[n]$ and $DS_2[n]$ in Fig. 10 are low. According to the DFE logic in TABLE II, this results in (DH[n], DL[n]) = (0, 0). When the data amplitude is between 0 and $h_{0,BR}/2$, ES[n] = 1 (blue part) in $Zone_1$, which indicates an Early clock according to the BRPD logic in TABLE III. When the data amplitude is below 0, ES[n] = 0, which indicates a *Late* clock. Since the red part is larger than the blue part, the average output of the BRPD logic indicates that the clock is *Late*. Similarly, when $CK_n$ samples in $Zone_2$, the BRPD logic outputs more Late decisions. Thus, when the clock samples in either $Zone_1$ or $Zone_2$, the VCO frequency increases to reduce $|\Delta T_{CK}|$ until the BRPD locks. This is why the proposed BRPD exhibits a single locking point, marked by the red circle in Fig. 11(c).

The transfer curve in Fig. 11(c) shows an asymmetric phase characteristic. When the frequency of the sampling clock CK is lower than the baud-rate frequency, the BRPD is more likely to output Late. Through the clock and data recovery (CDR) loop, these additional Late outputs increase the frequency of the sampling clock, so the BRPD has a wide upward frequency acquisition range. However, the narrow region that outputs Early limits its downward frequency acquisition range. According to MATLAB simulation results, the upward and downward frequency acquisition ranges are 34% and less than 1000 ppm, respectively.

#### C. Voltage Margin Comparison

Although the DFE effectively eliminates the post-cursor ISI, the pre-cursor ISI remains. To evaluate the impact of the pre-cursor ISI on the voltage margin [11], five channel responses with losses from 9.9 dB to 28.3 dB at 11.52 GHz are considered, as shown in Fig. 12(a). Fig. 12(b) shows the normalized SBR for 23.04-GBaud data through the 28.3-dB channel and the CTLE. With a bang-bang phase detector (BBPD), $h_{0,BB}$ and $h_{-1,BB}$ denote the main cursor and the first pre-cursor, respectively. With the BRPD, the corresponding cursors are $h_{0,BR}$ and $h_{-1,BR}$. Fig. 12(c) shows the normalized cursors versus channel loss for the BBPD and the BRPD. Assume that the CTLE and the DFE eliminate all post-cursors, and consider only the first pre-cursor. The voltage margins using the BBPD and the BRPD are then approximated as

$$\begin{cases} VM_{PAM3,BB} \approx h_{0,BB} - 2h_{-1,BB} \\ VM_{PAM3,BR} \approx h_{0,BR} - 2h_{-1,BR} \end{cases}, \tag{17}$$

respectively. The normalized voltage margins in Fig. 12(d) are calculated from (17), using the simulated main cursors and first pre-cursors of Fig. 12(c). Below a channel loss of 17 dB, the BBPD gives a higher voltage margin than the BRPD; above 17 dB, it gives a lower one. In Fig. 12(c), $h_{-1,BB}$ becomes significant in the high-loss channel, while $h_{-1,BR}$ remains negligible. As a result, when the channel loss is high and the pre-cursor ISI is significant, the BRPD provides a larger voltage margin than the BBPD.

Fig. 12. (a) Five channel responses, (b) simulated normalized SBR, (c) simulated normalized cursors versus channel loss, and (d) simulated normalized voltage margins versus channel loss.
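The factor of two in (17) can be checked by brute force. The pre-cursor term $S_{n+1}h_{-1}$ shifts both adjacent PAM-3 levels, so the worst case subtracts $|h_{-1}|$ from each side of the eye. The Python sketch below enumerates $S_{n+1}$ for the upper eye. The cursor values in the usage loop are hypothetical and are not read from Fig. 12.

```python
def pam3_upper_eye(h0, h_pre):
    """Worst-case opening between the S_n = +1 and S_n = 0 levels.

    The next symbol S_{n+1} in {+1, 0, -1} adds S_{n+1} * h_pre to each level.
    """
    upper = [h0 + s_next * h_pre for s_next in (1, 0, -1)]
    lower = [s_next * h_pre for s_next in (1, 0, -1)]
    return min(upper) - max(lower)


# Hypothetical normalized cursors (not taken from Fig. 12)
for label, h0, h_pre in (("BBPD-like", 1.0, 0.15), ("BRPD-like", 0.8, 0.02)):
    vm = pam3_upper_eye(h0, h_pre)
    print(f"{label}: brute force {vm:.3f}, (17) {h0 - 2 * abs(h_pre):.3f}")
```

By symmetry, the lower eye, between $S_n = 0$ and $S_n = -1$, gives the same result.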

#### IV. QUARTER-RATE PAM-3 RECEIVER

Fig. 13 shows the proposed quarter-rate PAM-3 receiver. It consists of an inductor-reused CTLE, sixteen data slicers, four error slicers, two reference generators [12], retimers, twenty 1-to-4 DMUXes, a synthesized logic circuit, a digital-to-analog converter (DAC), an LC voltage-controlled oscillator (VCO), a quadrature (IQ) divider [13], and a divide-by-4 divider. The synthesized logic circuit is composed of a DFE logic, a BRPD logic, a digital loop filter (DLF), a binary-to-thermometer code (B2T) converter, and a PRTS checker.

In a full-rate architecture, as shown in Fig. 10, four data slicers and one error slicer are needed. Consequently, the quarter-rate architecture requires twenty slicers in total. Each data and error slicer in the quarter-rate slicer bank is implemented as a four-input StrongARM latch [14] in series with an SR latch. As shown in Fig. 14, the servo loop within the reference generator regulates the output common-mode voltage $V_{cm}$ to equal the common-mode voltage of the CTLE, $V_{cm,CTLE}$. The code from the synthesized logic sets the difference between the differential output voltages $V_{refP}$ and $V_{refN}$. The two generators provide the four differential reference voltages, $\pm 3h_{0,BR}/2$ and $\pm h_{0,BR}/2$. According to the simulation results, these correspond to $\pm 196.6$ mV and $\pm 68.7$ mV, respectively. The remaining circuits are discussed below.

#### A. DFE Logic and BRPD Logic

Fig. 13. Proposed quarter-rate PAM-3 receiver with the one-tap loop-unrolled DFE.

Fig. 14. Reference generator.

Fig. 15 shows the DFE logic, which consists of 96 DFFs and 32 3-to-1 MUXes clocked by $CK_{div4}$ at 1.44 GHz. The 64-bit $DS_{1-4}$[15:0] from the DMUXes is retimed by 64 DFFs aligned with $CK_{div4}$. According to TABLE II, 32 3-to-1 MUXes and 96 DFFs select the recovered data DH[15:0] and DL[15:0] from $DS_{1-4}$[15:0]. The red dashed line in Fig. 15 marks the critical path. Its delay must be shorter than the period of $CK_{div4}$, expressed as

$$T_{CKQ} + 16T_{MUX} + T_{setup} < 16\,\mathrm{UI} \tag{18}$$

where $T_{CKQ}$ denotes the clock-to-Q delay of the DFF, $T_{MUX}$ denotes the propagation delay of the 3-to-1 MUX, $T_{setup}$ denotes the setup time of the DFF, and the period of $CK_{div4}$ is 16 UI (= 0.69 ns). Look-ahead logic [5] could reduce the critical-path delay to $T_{CKQ} + 5T_{MUX} + T_{setup}$, but it requires 96 DFFs and 100 3-to-1 MUXes. According to the simulation results, both $T_{CKQ}$ and $T_{setup}$ are 0.01 ns and $T_{MUX}$ is 0.03 ns. The left-hand side of (18) is therefore 0.50 ns, which is shorter than 0.69 ns. The proposed DFE logic thus meets the timing requirement of (18) while reducing both hardware and power.
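The Python sketch below illustrates the timing budget in (18) and the serial nature of the DFE logic. The 16-lane loop is a behavioral model with a hypothetical function name, not the synthesized netlist. Each lane's MUX select is the decision of the previous lane, which is why 16 MUX delays appear in series in (18). The delay values are the simulated ones quoted above.

```python
UI = 1 / 23.04e9                           # s, one PAM-3 symbol period
T_period = 16 * UI                         # CK_div4 period (1.44 GHz)
T_CKQ, T_setup, T_MUX = 0.01e-9, 0.01e-9, 0.03e-9   # simulated values from the text

serial = T_CKQ + 16 * T_MUX + T_setup      # left-hand side of (18)
lookahead = T_CKQ + 5 * T_MUX + T_setup    # with look-ahead logic [5]
print(f"period {T_period * 1e12:.0f} ps, serial {serial * 1e12:.0f} ps, "
      f"look-ahead {lookahead * 1e12:.0f} ps, slack {(T_period - serial) * 1e12:.0f} ps")

# TABLE II as MUX selects: previous (DH, DL) -> indices into (DS1, DS2, DS3, DS4)
SELECT = {(1, 1): (0, 1), (0, 1): (1, 2), (0, 0): (2, 3)}


def dfe_logic_cycle(ds_lanes, prev):
    """Resolve 16 parallel lanes in one CK_div4 cycle (hypothetical behavioral model).

    Each lane's select is the previous lane's decision, so the MUXes form a serial chain.
    """
    out = []
    for ds in ds_lanes:                    # ds = (DS1, DS2, DS3, DS4) for one UI
        i, j = SELECT[prev]
        prev = (ds[i], ds[j])
        out.append(prev)
    return out, prev                       # last decision is stored for the next cycle
```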

According to TABLE III, the BRPD logic uses ES[15:0], DH[15:0], and DL[15:0] to generate the Late/Early signal.



Fig. 15. DFE logic.

#### B. DLF and LCVCO

Driven by the Late/Early outputs of the BRPD logic, a first-order DLF produces a 5-bit coarse code $DLF_{coarse}$[4:0] and a 7-bit fine code $DLF_{fine}$[6:0]. The B2T converter then converts $DLF_{fine}$[6:0] into a 127-bit thermometer code B2T[126:0]. Fig. 16(a) shows the DAC, composed of a resistor ladder, 254 switches, and a capacitor. The thermometer code B2T[126:0] controls the DAC to generate the control voltage $V_{DAC}$. Fig. 16(b) shows the LCVCO with two varactors controlled by $V_{DAC}$. $DLF_{coarse}$[4:0] controls the switched-capacitor bank of the LCVCO. The LCVCO frequency ranges from 7.6 GHz to 11.8 GHz and generates the differential clocks CK and $\overline{CK}$. The simulated phase noise is -106.4 dBc/Hz at a 1-MHz offset, and the simulated gain is 340 MHz/V. The IQ divider generates the quadrature clocks $CK_0$, $CK_{90}$, $CK_{180}$, and $CK_{270}$ from CK and $\overline{CK}$.
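Below is a minimal Python model of the B2T converter. It assumes that a 7-bit code $k$ sets the $k$ least significant bits of B2T[126:0]; this bit ordering is an assumption. Consecutive codes differ in exactly one thermometer bit, so each fine-code step changes a single DAC control bit, which helps keep $V_{DAC}$ monotonic.

```python
def b2t(code, width=7):
    """Binary-to-thermometer conversion: code k sets bits B2T[0]..B2T[k-1] (assumed order)."""
    if not 0 <= code < 2 ** width:
        raise ValueError(f"code must be in [0, {2 ** width - 1}]")
    n_bits = 2 ** width - 1                          # 127 outputs for a 7-bit input
    return [1 if i < code else 0 for i in range(n_bits)]


t = b2t(45)
print(len(t), sum(t))                                # 127 outputs, 45 of them high
# Adjacent codes differ in exactly one thermometer bit
steps = [sum(a != b for a, b in zip(b2t(k), b2t(k + 1))) for k in range(127)]
print(set(steps))
```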



Fig. 16. (a) DAC, and (b) LCVCO.



Fig. 17. PRTS-7 generator.



Fig. 18. PRTS checker.

#### C. PRTS Checker

Fig. 17 shows the PRTS generator of $3^7-1$ [2], composed of seven registers, a 2-bit adder, a 2-bit multiplier, and a modulo-3 operator. The PAM-3 symbol $S_k$ is generated by the recursion

$$S_k = \left(S_{k-2} + 2 \cdot S_{k-7}\right) \bmod 3 \tag{19}$$

where k denotes the time index. Fig. 18 shows the PRTS checker [15], composed of a selector, a decoder, a 2-bit multiplier, a 2-bit adder, a 2-bit exclusive-or (XOR) gate, and a toggle flip-flop (TFF). The selector and the decoder process the 32-bit recovered data DH[15:0] and DL[15:0] from the DFE logic. They decode the three symbols $D_{k-7}$, $D_{k-2}$, and $D_k$, where $k=0\sim31$ as shown in Fig. 17. Negative time indices represent the recovered data from the previous clock cycle, stored in the selector. For error-free data, the received symbol $D_k$ should equal $(D_{k-2}+2\cdot D_{k-7}) \bmod 3$. The XOR gate verifies this equivalence, and its output, $XOR_o$, is given by

$$XOR_o = \begin{cases} 1, & \text{if } D_k \neq \left(D_{k-2} + 2 \cdot D_{k-7}\right) \bmod 3 \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$



Fig. 19. Measurement setup.





Fig. 20. (a) Die photo, and (b) power breakdown.



Fig. 21. Measured eye diagram of 23.04 Gbaud PAM-3 signaling (a) before, and (b) after the ISI board.

The TFF toggles $Check_{PRTS}$ whenever $XOR_o$ flags an error. While $XOR_o$ remains low, $Check_{PRTS}$ keeps its previous value. The BER of the recovered data can therefore be estimated by measuring how long $Check_{PRTS}$ remains unchanged.
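The Python sketch below covers the generator recursion (19), the check in (20), and the BER bound used in Section V. It assumes symbols encoded as {0, 1, 2} and an arbitrary nonzero seed. It models the arithmetic only, not the selector, the 32-symbol parallelization, or the TFF. The last lines apply the common zero-error bound, BER < 1/N or about 3/N at 95% confidence, to the 400-s, 36-Gb/s observation reported in Section V.

```python
def prts7(n, seed=(1, 0, 0, 0, 0, 0, 0)):
    """PAM-3 PRTS per (19), symbols encoded as {0, 1, 2}; seed must not be all zeros."""
    state = list(seed)                 # state[0] = S_{k-7}, ..., state[6] = S_{k-1}
    out = []
    for _ in range(n):
        s_k = (state[-2] + 2 * state[0]) % 3
        out.append(s_k)
        state = state[1:] + [s_k]
    return out


def prts7_check(d):
    """Indices k where XOR_o of (20) would be 1."""
    return [k for k in range(7, len(d)) if d[k] != (d[k - 2] + 2 * d[k - 7]) % 3]


data = prts7(5000)
print("errors in clean data:", len(prts7_check(data)))
corrupted = list(data)
corrupted[100] = (corrupted[100] + 1) % 3       # inject one symbol error
print("flagged indices:", prts7_check(corrupted))

# Zero-error BER bound for the measurement in Section V
bit_rate, t_obs = 36e9, 400.0                   # b/s, s
n_bits = bit_rate * t_obs
print(f"{n_bits:.2e} bits: BER < {1 / n_bits:.1e} (zero errors), "
      f"< {3 / n_bits:.1e} at ~95% confidence")
```

A single symbol error enters three checks, at $k$, $k+2$, and $k+7$, so the injected error is flagged three times.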

#### V. EXPERIMENTAL RESULTS

Fig. 19 shows the measurement setup. The pattern generator (Keysight M8040A) generates the differential 23.04-GBaud PRTS of 3<sup>7</sup>-1. The M8049A ISI board serves as the channel. Its measured frequency response, with a channel loss of 20.5 dB at 11.52 GHz, is shown in Fig. 11(a). A field-programmable gate array (FPGA) generates the control signals for the device under test (DUT). The MDO3104 oscilloscope monitors $Check_{PRTS}$. The deserialized-by-4 recovered data $D_{out}$ and the divide-by-8 recovered clock $CK_{out}$ are measured with the Keysight MSOV334A real-time oscilloscope and the R&S FSWP phase noise analyzer.

TABLE IV PERFORMANCE SUMMARY AND COMPARISON

| Reference            | [11]<br>JSSC'23     | [16]<br>JSSC'17     | [17]<br>ISSCC'19    | [18]<br>ISSCC'19    | [19]<br>JSSC'22     | [20]<br>ISSCC'23    | This<br>work        |
|----------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| Technology (nm)      | 40                  | 65                  | 28                  | 7                   | 40                  | 28                  | 22                  |
| Modulation           | PAM-4               | NRZ                 | NRZ                 | PAM-4               | PAM-4               | PAM-4               | PAM-3               |
| Data-rate (Gb/s)     | 48                  | 40                  | 36                  | 56                  | 56                  | 52                  | 36                  |
| Channel Loss<br>(dB) | 4                   | 16                  | 18                  | 17.8                | 10                  | 7.1                 | 20.5                |
| Equalizer            | 1-tap DFE           | TX-FFE<br>CTLE      | CTLE<br>1-tap DFE   | TX FIR<br>CTLE      | CTLE<br>1-tap DFE   | CTLE                | CTLE<br>1-tap DFE   |
| BER                  | < 10 <sup>-11</sup> | < 10 <sup>-12</sup> |
| Power (mW)           | 116.3               | 225                 | 106.3               | 79                  | 278                 | 43.1                | 51.7                |
| Core Area (mm²)      | 0.24                | 1.92*               | 1.23*               | 0.13                | 0.33                | 0.011               | 0.25                |
| Efficiency (pJ/b)    | 2.42                | 5.62                | 3.04                | 1.41                | 4.96                | 0.83                | 1.44                |
| FoM** (pJ/b/dB)      | 0.60                | 0.35                | 0.16                | 0.08                | 0.49                | 0.11                | 0.07                |

FoM\*\* = Power/Data-rate/Channel Loss.

Fig. 22. Measured output of PRTS checker when (a) BER is high, and (b) BER is low.

Fig. 23. Measured eye diagram of the deserialized-by-4 recovered data at 1.44 Gb/s.

The proposed PAM-3 receiver is fabricated in 22-nm CMOS technology. Fig. 20(a) shows the die photo; the core area is 0.25 mm<sup>2</sup>. Excluding the output buffers, the total power consumption is 51.7 mW, and Fig. 20(b) shows the power breakdown.



Fig. 24. Measured eye diagram of the divide-by-16 recovered clock at 720 MHz.



Fig. 25. Measured phase noise of the 720-MHz divide-by-16 recovered clock at frequency offsets from 100 Hz to 100 MHz.
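The integrated jitter quoted for Fig. 25 is obtained by integrating the single-sideband phase noise $L(f)$ over the offset range and converting phase to time, $\sigma_t = \sqrt{2\int 10^{L(f)/10}\,df}/(2\pi f_0)$. The Python sketch below illustrates that computation, assuming straight-line interpolation of $L(f)$ on a log-log plot between points. The phase-noise profile in the example is hypothetical and is not the measured trace.

```python
import math


def integrated_rms_jitter(offsets_hz, l_dbc_hz, f_carrier_hz):
    """RMS jitter (s) from SSB phase noise L(f) in dBc/Hz.

    Between samples, L(f) is a straight line on a log-log plot, so each
    segment is a power law S(f) = S1 * (f / f1)**a, integrated exactly.
    Integration runs from the first to the last offset.
    """
    area = 0.0  # integral of 10**(L/10); equals half the phase variance (rad^2)
    points = list(zip(offsets_hz, l_dbc_hz))
    for (f1, l1), (f2, l2) in zip(points, points[1:]):
        s1, s2 = 10 ** (l1 / 10), 10 ** (l2 / 10)
        a = math.log(s2 / s1) / math.log(f2 / f1)  # local log-log slope
        if abs(a + 1) < 1e-9:
            area += s1 * f1 * math.log(f2 / f1)
        else:
            area += s1 * f1 / (a + 1) * ((f2 / f1) ** (a + 1) - 1)
    phase_rms_rad = math.sqrt(2 * area)  # factor 2 accounts for both sidebands
    return phase_rms_rad / (2 * math.pi * f_carrier_hz)


# Hypothetical profile (NOT the measured Fig. 25 data), integrated from
# 100 Hz to 100 MHz around a 720-MHz carrier.
offsets = [1e2, 1e4, 1e6, 1e8]
l_dbc = [-90, -110, -130, -150]
print(f"{integrated_rms_jitter(offsets, l_dbc, 720e6) * 1e15:.1f} fs rms")
```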

\*: Total area (TABLE IV).

Fig. 26. (a) Measured jitter transfer function and (b) simulated and measured jitter tolerance.

Fig. 21(a) shows the measured eye diagram of the 23.04-GBaud PAM-3 signal before the ISI board, with a 1-V<sub>pp</sub> differential amplitude. This eye diagram shows the raw PAM-3 input without duobinary encoding. Fig. 21(b) shows the measured eye diagram after the ISI board. The BER is estimated by monitoring $Check_{PRTS}$. Fig. 22(a) shows that $Check_{PRTS}$ toggles when the BER is high. Fig. 22(b) shows that $Check_{PRTS}$ remains at a constant level for longer than 400 s. Hence, the BER is estimated to be less than $10^{-12}$ at the data rate of 36 Gb/s. The eye diagrams of the deserialized-by-4 recovered data and the divide-by-16 recovered clock are shown in Fig. 23 and Fig. 24, respectively. The measured rms and peak-to-peak jitter are 1.40 ps / 10.98 ps for the deserialized-by-4 recovered data and 0.996 ps / 8.55 ps for the divide-by-16 recovered clock. The ripple observed on both the logic-0 and logic-1 levels may be attributed to the power supplies of the output drivers and to the bonding wires. Because the two output signals are not synchronized, supply disturbances can arise, particularly when one signal transitions while the other remains static. In addition, the large output drivers draw a substantial load current, which may inject supply noise through the bonding wires. Fig. 25 shows the measured phase noise of the divide-by-16 recovered clock. The integrated rms jitter from 100 Hz to 100 MHz is 267 fs.
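The 400-s error-free observation can be converted into a BER bound. Under three assumptions, zero errors in $N$ bits bound the BER below $-\ln(1-\mathrm{CL})/N$: bit errors are independent, the confidence level CL is 95% (the measurement does not state one), and every error would be visible on $Check_{PRTS}$. A minimal Python sketch:

```python
import math


def ber_upper_bound_zero_errors(bit_rate_bps, duration_s, confidence=0.95):
    """Upper BER bound at a given confidence level when no errors are seen.

    With N = bit_rate * duration independent bits and zero errors,
    P(0 errors | BER) = (1 - BER)**N ~= exp(-N * BER); setting this equal to
    1 - confidence gives BER < -ln(1 - confidence) / N.
    """
    n_bits = bit_rate_bps * duration_s
    return -math.log(1 - confidence) / n_bits


def min_error_free_time(bit_rate_bps, target_ber, confidence=0.95):
    """Shortest error-free observation (s) that supports BER < target_ber."""
    return -math.log(1 - confidence) / (bit_rate_bps * target_ber)


print(ber_upper_bound_zero_errors(36e9, 400))  # about 2.1e-13
print(min_error_free_time(36e9, 1e-12))        # about 83 s
```

Under these assumptions, 400 s at 36 Gb/s (1.44 × 10<sup>13</sup> bits) leaves a comfortable margin below $10^{-12}$.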

Fig. 26(a) plots the measured jitter transfer function. With a sinusoidal jitter amplitude of 0.1 UI, the -3-dB loop bandwidth is approximately 15 MHz. The measured jitter tolerance (JTOL) is plotted in Fig. 26(b). The receiver tolerates 0.07 UI<sub>pp</sub> at a modulation frequency of 100 MHz, and the corner frequency is approximately 15 MHz. The simulated jitter tolerance was obtained in MATLAB for quick verification. TABLE IV summarizes the performance of this work and compares it with prior art. The proposed PAM-3 receiver achieves an energy efficiency of 1.44 pJ/b and an FoM of 0.07 pJ/b/dB. Compared with [16] and [17], this work achieves better energy efficiency at similar data rates and channel losses. [11], [18], [19], and [20] operate at baud rates between 20 and 30 GBaud, the same range as this work. Compared with them, the proposed receiver operates under a larger channel loss while achieving the lowest FoM in TABLE IV.
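A linearized first-order CDR model relates the 15-MHz loop bandwidth to the shape of the JTOL curve, as sketched below in Python. This is not the MATLAB model used for Fig. 26(b), because a baud-rate bang-bang loop is nonlinear. The 0.07-UI<sub>pp</sub> high-frequency plateau is an assumption taken from the 100-MHz measurement point.

```python
import math


def jitter_transfer_db(f_hz, f_bw_hz):
    """|H(jf)|^2 in dB for a first-order loop H(s) = 1 / (1 + s / w_bw)."""
    return -10 * math.log10(1 + (f_hz / f_bw_hz) ** 2)


def jtol_uipp(f_hz, f_corner_hz, hf_tol_uipp):
    """Linearized JTOL: the high-frequency tolerance divided by the
    error transfer |1 - H(jf)| = f / sqrt(f^2 + fc^2).
    """
    return hf_tol_uipp * math.sqrt(1 + (f_corner_hz / f_hz) ** 2)


# Assumed parameters: 15-MHz bandwidth/corner (Fig. 26) and a 0.07-UIpp
# high-frequency plateau taken from the 100-MHz measurement point.
for f in (1e6, 15e6, 100e6):
    print(f"{f / 1e6:6.1f} MHz: H = {jitter_transfer_db(f, 15e6):6.2f} dB, "
          f"JTOL = {jtol_uipp(f, 15e6, 0.07):.3f} UIpp")
```

Below the corner, the modeled JTOL rises at 20 dB/decade. Above the corner, it flattens toward the assumed plateau.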

# VI. CONCLUSION

A 36-Gb/s PAM-3 receiver has been presented. The proposed inductor-reused CTLE introduces a high-pass feedforward path to boost the high-frequency gain. A capacitor array tunes the high-frequency gain while maintaining the low-frequency gain. The 1+D approach is proposed to reduce the number of slicers in the loop-unrolled DFE. A BRPD is presented to avoid false-lock points in baud-rate sampling. Fabricated in 22-nm CMOS, the receiver achieves an energy efficiency of 1.44 pJ/b and an FoM of 0.07 pJ/b/dB.

# ACKNOWLEDGMENT

The authors would like to thank Genesys Logic Inc., the Intelligent and Sustainable Medical Electronics Research Fund of National Taiwan University, and the National Science and Technology Council, Taipei, Taiwan, for their support.

#### REFERENCES

- [1] H. Park, J. Song, Y. Lee, J. Sim, J. Choi, and C. Kim, "A 3-bit/2UI 27Gb/s PAM-3 single-ended transceiver using one-tap DFE for next-generation memory interface," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 382–384.
- [2] USB4 Specification V2.0. Accessed: Sep. 2024. [Online]. Available: https://www.usb.org
- [3] S. Gondi and B. Razavi, "Equalization and clock and data recovery techniques for 10 Gb/s CMOS serial-link receivers," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 1999–2011, Sep. 2007.
- [4] T.-C. Huang, T.-W. Chung, C.-H. Chern, M.-C. Huang, C.-C. Lin, and F.-L. Hsueh, "A 28 Gb/s 1 pJ/b shared-inductor optical receiver with 56% chip-area reduction in 28 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 144–145.
- [5] A. Cevrero et al., "A 100 Gb/s 1.1 pJ/b PAM-4 RX with dual-mode 1-tap PAM-4/3-tap NRZ speculative DFE in 14 nm CMOS FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 112–114.
- [6] D. Kim et al., "A 12-Gb/s 10-ns turn-on time rapid on/off baud-rate DFE receiver in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 55, no. 8, pp. 2196–2205, Aug. 2020.
- [7] Analog Library Reference Product Version IC6.1.8, Cadence Des. Syst., San Jose, CA, USA, Oct. 2020.
- [8] T. Norimatsu, K. Kogo, T. Komori, N. Kohmu, F. Yuki, and T. Kawamoto, "A 100 Gbps 4-lane transceiver for 47 dB loss copper cable in 28 nm CMOS," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 10, pp. 3433–3443, Oct. 2020.
- [9] K. Mueller and M. Müller, "Timing recovery in digital synchronous data receivers," *IEEE Trans. Commun.*, vol. COM-24, no. 5, pp. 516–531, May 1976.
- [10] F. Tachibana et al., "A 56-Gb/s PAM4 transceiver with false-lock-aware locking scheme for Mueller-Müller CDR," in *Proc. IEEE 48th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2022, pp. 505–508.
- [11] W. Jung, K. Lee, K. Park, H. Ju, J. Lee, and D.-K. Jeong, "A 48 Gb/s PAM-4 receiver with pre-cursor adjustable baud-rate phase detector in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 58, no. 5, pp. 1414–1424, May 2023.
- [12] M.-S. Chen, Y.-N. Shih, C.-L. Lin, H.-W. Hung, and J. Lee, "A fully-integrated 40 Gb/s transceiver in 65 nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 627–640, Mar. 2012.
- [13] A. A. Hafez and C. K. Yang, "Analysis and design of superharmonic injection-locked multipath ring oscillators," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 7, pp. 1712–1725, Jul. 2013.
- [14] B. Razavi, "The StrongARM latch," *IEEE Solid-State Circuits Mag.*, vol. 7, no. 2, pp. 12–17, Jun. 2015.
- [15] J.-E. Lin, Y.-H. Lan, and S.-I. Liu, "A 40 Gb/s PAM-3 receiver with modified summer-merged slicers and PRTS checker," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 32, no. 8, pp. 1512–1522, Aug. 2024.
- [16] X. Zheng et al., "A 40 Gb/s quarter-rate SerDes transmitter and receiver chipset in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 11, pp. 2963–2978, Nov. 2017.
- [17] D. Yoo, M. Bagherbeik, W. Rahman, A. Sheikholeslami, H. Tamura, and T. Shibasaki, "A 36 Gb/s adaptive baud-rate CDR with CTLE and 1-Tap DFE in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 126–128.
- [18] S. Shahramian et al., "A 1.41 pJ/b 56 Gb/s PAM-4 wireline receiver employing enhanced pattern utilization CDR and genetic adaptation algorithms in 7nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 482–484.
- [19] P.-J. Peng et al., "A 56 Gb/s PAM-4 transmitter/receiver chipset with nonlinear FFE for VCSEL-based optical links in 40 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 57, no. 10, pp. 3025–3035, Oct. 2023.
- [20] S. Park et al., "A 0.83 pJ/b 52 Gb/s PAM-4 baud-rate CDR with pattern-based phase detector for short-reach applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2023, pp. 118–120.



**Pin-Yuan Chiu** was born in New Taipei City, Taiwan, in 1999. He received the B.S. and M.S. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 2021 and 2024, respectively.

He joined MediaTek, Hsinchu, Taiwan, in 2024. His research interests include high-speed communication systems and mixed-signal IC design.



**Shen-Iuan Liu** (Fellow, IEEE) was born in Keelung, Taiwan, in 1965. He received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1987 and 1991, respectively.

From 1991 to 1993, he was a Second Lieutenant with the Chinese Air Force, Taichung, Taiwan. From 1991 to 1994, he was an Associate Professor with the Department of Electronic Engineering, National Taiwan Institute of Technology, Taipei. He joined the Department of Electrical Engineering, NTU, in 1994, where he has been a Professor since 1998 and a Distinguished Professor since August 2010. He was the Director of the Graduate Institute of Electronics Engineering, NTU, from 2013 to 2016. His research interests include analog and digital integrated circuits and systems. He served as a Technical Program Committee Member for ISSCC from 2006 to 2008, IEEE VLSI-DAT from 2008 to 2012, and A-SSCC from 2005 to 2012. His awards include:

- the Engineering Paper Award from the Chinese Institute of Engineers in 2003;
- the Young Professor Teaching Award from MXIC Inc.;
- the Research Achievement Award from NTU;
- the Outstanding Research Award from the National Science Council in 2004;
- the Outstanding Research Award from the Ministry of Science and Technology in 2014;
- the Best Paper Awards at the 2020 and 2021 International Symposium on VLSI Design, Automation and Test;
- the Best Paper Award at the 2023 International VLSI Symposium on Technology, Systems and Applications.

He received the Teaching Excellence Award from NTU in 2022. He was awarded the Himax Chair Professorship at NTU in 2010. He served as the Chair for the IEEE SSCS Taipei Chapter from 2004 to 2008, and the chapter received the Best Chapter Award in 2009. He received the Academic Contribution Award from the College of Electrical Engineering and Computer Science in 2023. He was named one of the top contributors to A-SSCC in 2024. He served as the General Chair for the 15th VLSI Design/CAD Symposium, Taiwan, in 2004, and the Program Co-Chair for the Fourth IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Fukuoka, Japan, in 2004. He also served as the Technical Program Committee Co-Chair and the Chair for A-SSCC in 2010 and 2011. He was an Associate Editor of IEEE JOURNAL OF SOLID-STATE CIRCUITS from 2006 to 2009 and the Guest Editor of IEEE JOURNAL OF SOLID-STATE CIRCUITS Special Issue from December 2008 to November 2012.
He was an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS from 2006 to 2007 and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS from 2008 to 2009. He was on the Editorial Board of Research Letters in Electronics from 2008 to 2009. He was an Associate Editor of Institute of Electronics, Information and Communication Engineers (IEICE) and WSEAS Transactions on Electronics from 2008 to 2011. He has been an Associate Editor of ETRI Journal and the Journal of Semiconductor Technology and Science, South Korea, since 2009.